[v1] Refactor KVCacheConfig #14079


Merged: 12 commits into vllm-project:main, Mar 21, 2025

Conversation

@heheda12345 (Collaborator) commented Mar 1, 2025

This PR makes the following changes to KVCacheConfig:

  1. Change the meaning of KVCacheSpec from the spec of the whole model to the spec of one layer.
  2. KVCacheConfig class: save the kv_cache_spec for each KV cache group instead of for each layer.
  3. Restructure the logic that builds the same KVCacheConfig for all workers (EngineCore._initialize_kv_caches) into 3 steps:
    1. Get the available memory of each worker independently.
    2. Compute the kv_cache_config for each worker independently.
    3. Adjust the kv_cache_configs of all workers to make them identical, including assigning the same num_blocks and the same KVCacheGroupSpec (make_kv_cache_configs_consistent).
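The 3-step flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the merged vLLM code: `KVCacheConfig` is reduced to a single `num_blocks` field, and the unification helper is simplified to just the num_blocks adjustment described in the PR.

```python
# Illustrative sketch of step 3; field names and helpers are simplified
# stand-ins for the real vLLM KVCacheConfig, not the merged implementation.
from dataclasses import dataclass


@dataclass
class KVCacheConfig:
    num_blocks: int  # number of KV cache blocks this worker can allocate


def unify_kv_cache_configs(kv_cache_configs):
    # Make all per-worker configs identical by taking the minimum num_blocks
    # across ranks. A rank with a larger tensor can safely use only its first
    # `min_num_blocks` blocks, so no tensor shrinking is needed.
    min_num_blocks = min(cfg.num_blocks for cfg in kv_cache_configs)
    for cfg in kv_cache_configs:
        cfg.num_blocks = min_num_blocks
    return kv_cache_configs


# Steps 1-2 happen per worker; simulated here with example memory budgets.
configs = [KVCacheConfig(num_blocks=n) for n in (1000, 998, 1003)]
unify_kv_cache_configs(configs)
```

After unification every rank reports the same num_blocks (998 in this toy run), which is what lets the scheduler hand out block IDs that are valid on all workers.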

Signed-off-by: Chen Zhang <[email protected]>

github-actions bot commented Mar 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@zhuohan123 (Member) left a comment:

Left some minor comments. In general LGTM!

@@ -87,6 +84,18 @@ class KVCacheTensor:
size: int # The size of KV cache Tensor in bytes


@dataclass
class VirtualLayer:
A Member commented:
Why do we call this VirtualLayer? I believe Woosuk and you had some discussions about this. The issue with VirtualLayer is that you can't tell it's related to the KV cache and KV cache grouping. To me, a name like "GroupedLayerKV" would be more straightforward.
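For readers following the naming debate, here is a hypothetical sketch of what the dataclass represents: a group of real model layers that share one per-layer KVCacheSpec (the per-layer meaning this PR introduces). All fields and values are illustrative assumptions, not the merged vLLM definitions.

```python
# Hypothetical illustration of the "virtual layer" idea under discussion:
# several real attention layers whose KV caches share one spec and can be
# managed as a group. Field names and values are assumptions for this sketch.
from dataclasses import dataclass


@dataclass
class KVCacheSpec:
    # Per-layer spec, per this PR's change to KVCacheSpec's meaning.
    block_size: int
    num_kv_heads: int
    head_size: int


@dataclass
class VirtualLayer:  # the name under debate; alternatives: GroupedLayerKV, etc.
    layer_names: list[str]  # real layers whose KV caches share this spec
    kv_cache_spec: KVCacheSpec


group = VirtualLayer(
    layer_names=["model.layers.0.attn", "model.layers.1.attn"],
    kv_cache_spec=KVCacheSpec(block_size=16, num_kv_heads=8, head_size=128),
)
```

The grouping is what makes the name hard: the object is not a layer at all, but a set of layers plus the one KV cache spec they have in common.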

@heheda12345 (Collaborator, Author) replied:

Discussing in the #hybrid-mem channel on Slack.


mergify bot commented Mar 9, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 9, 2025
@mergify mergify bot removed the needs-rebase label Mar 9, 2025
@heheda12345 (Collaborator, Author) left a comment:

@zhuohan123 Updated the PR based on your review. More discussion on the name of VirtualLayer is needed.

Signed-off-by: Chen Zhang <[email protected]>
raise NotImplementedError


def make_kv_cache_configs_consistent(kv_cache_configs: list[KVCacheConfig]):
A Collaborator commented:
Looks more like `unify_kv_cache_configs`?

# Change the num_blocks of each rank to the smallest among all ranks. We
# do not need to shrink the tensor size because it is valid to only use the
# first `num_blocks` blocks of the tensor.
num_blocks = min(kv_cache_config.num_blocks
A Collaborator commented:

nit

Suggested change:
-    num_blocks = min(kv_cache_config.num_blocks
+    min_num_blocks = min(kv_cache_config.num_blocks
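The comment on this hunk says shrinking the tensor is unnecessary because only the first `num_blocks` blocks are ever indexed. A toy sketch of that invariant, with plain Python lists standing in for the real KV cache tensors and made-up block counts:

```python
# Toy illustration: ranks allocate different numbers of blocks, but after
# unification the scheduler only hands out block IDs below the shared minimum,
# so the extra trailing blocks on larger ranks are simply never touched.
per_rank_num_blocks = [1000, 998, 1003]
min_num_blocks = min(per_rank_num_blocks)

# Each rank keeps its original allocation (no shrinking)...
tensors = [list(range(n)) for n in per_rank_num_blocks]

# ...and every valid block ID stays in range on every rank.
valid_block_ids = range(min_num_blocks)
in_range_everywhere = all(max(valid_block_ids) < len(t) for t in tensors)
```

This is why the PR can unify num_blocks after allocation instead of reallocating each rank's tensor.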

dtype=dtype,
device=self.device)
else:
raise NotImplementedError
A Collaborator commented:
Add a TODO


mergify bot commented Mar 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 12, 2025
@mergify mergify bot removed the needs-rebase label Mar 18, 2025
@heheda12345 (Collaborator, Author) commented:
@zhuohan123 @comaniac Updated the PR based on your comments. Can you review it again?

@comaniac (Collaborator) left a comment:

LGTM. Approving to unblock. Leaving the final call to @WoosukKwon and @zhuohan123.

@comaniac added the ready and force-merge labels Mar 20, 2025
@DarkLight1337 (Member) commented:
Please fix the merge conflict


mergify bot commented Mar 21, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @heheda12345.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 21, 2025
@mergify mergify bot removed the needs-rebase label Mar 21, 2025
@vllm-bot merged commit 93a00d7 into vllm-project:main Mar 21, 2025
30 of 32 checks passed
erictang000 pushed a commit to erictang000/vllm that referenced this pull request Mar 25, 2025
lengrongfu pushed a commit to lengrongfu/vllm that referenced this pull request Apr 2, 2025
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Labels: force-merge, ready, v1
6 participants